SVM + PCA

Part 1

Necessary imports

Task 1

Display and normalization routines

Instead of:

noise=int(idx[:-2])

I used the formula below:

noise=int(idx[-2:0:-1])/int(idx[2:])

because with the original formula the generated dataset was too noisy to be classified.
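To make the two slicing expressions concrete, here is a minimal sketch that evaluates both on a hypothetical index string `"123456"`. The real value of `idx` is not given here, so the numbers only show how the slices behave, not the noise level actually used.

```python
# Hypothetical index string, chosen only to illustrate the two formulas.
idx = "123456"

# Original formula: every character except the last two.
noise_original = int(idx[:-2])                  # "1234" -> 1234

# Formula actually used: from the second-to-last character back to index 1
# (reversed order), divided by everything from index 2 onwards.
noise_used = int(idx[-2:0:-1]) / int(idx[2:])   # 5432 / 3456

print(noise_original, noise_used)
```

For this hypothetical index the original formula gives a noise value in the thousands, while the modified one gives a value of order one.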

Task 2

Plotting routines
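A plotting routine for this kind of task typically shades the regions that a fitted classifier assigns to each class. The sketch below assumes a two-feature dataset and uses `make_moons` only as stand-in data; `plot_decision_boundary` is a hypothetical helper, not the notebook's own routine.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC


def plot_decision_boundary(clf, X, y, ax=None, steps=200):
    """Shade the class regions of a fitted classifier on 2-D data (hypothetical helper)."""
    if ax is None:
        _, ax = plt.subplots()
    # Grid that covers the data with a small margin.
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, steps),
                         np.linspace(y_min, y_max, steps))
    # Predict a class for every grid point, then reshape back to the grid.
    zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, zz, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k", s=20)
    return ax


# Stand-in data; the notebook's generated dataset may differ.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X, y)
ax = plot_decision_boundary(clf, X, y)
```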

Parameters
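One common way to study SVM parameters is to sweep a small grid of kernels and `C` values and compare held-out accuracy. The sketch below assumes `make_moons` data and an illustrative grid; the parameter values the notebook actually explores are not listed here.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical data and parameter grid, for illustration only.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

kernels = ["linear", "poly", "rbf"]
C_values = [0.1, 1.0, 10.0]

scores = {}
for kernel in kernels:
    for C in C_values:
        clf = SVC(kernel=kernel, C=C).fit(X_train, y_train)
        # Mean accuracy on the held-out split for this (kernel, C) pair.
        scores[(kernel, C)] = clf.score(X_test, y_test)

best = max(scores, key=scores.get)
print("best (kernel, C):", best, "accuracy:", round(scores[best], 3))
```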

Conclusions

Part 2

Task 1

Dataset description:

Download dataset

Task 2

Exploratory analysis of the dataset

Count of missing values per column
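In pandas, missing values per column are usually counted with `isna().sum()`. The sketch uses a small hypothetical frame in place of the downloaded dataset.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame standing in for the downloaded dataset.
df = pd.DataFrame({
    "age": [25, np.nan, 47, 33],
    "income": [np.nan, np.nan, 52000, 61000],
    "label": ["yes", "no", "no", "yes"],
})

missing = df.isna().sum()  # number of NaN entries in each column
print(missing)
```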

Convert data to numbers
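Typical conversions are numeric strings to numbers, a binary target to 0/1, and nominal features to one-hot columns. The column names and categories below are hypothetical; which conversions the real dataset needs depends on its columns.

```python
import pandas as pd

# Hypothetical columns of mixed types.
df = pd.DataFrame({
    "colour": ["red", "green", "red", "blue"],
    "size": ["3", "5", "4", "2"],          # numbers stored as text
    "label": ["yes", "no", "no", "yes"],
})

# Numeric strings -> numbers; unparsable entries would become NaN.
df["size"] = pd.to_numeric(df["size"], errors="coerce")
# Binary target -> 0/1.
df["label"] = df["label"].map({"no": 0, "yes": 1})
# Nominal feature -> one-hot indicator columns.
df = pd.get_dummies(df, columns=["colour"], dtype=int)
print(df)
```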

Task 3

Split the obtained dataset into training and test dataframes using a stratified split
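With scikit-learn, passing the target to `stratify` keeps the class proportions the same in both parts. The data below is hypothetical and deliberately imbalanced (80 negatives, 20 positives) so the effect is visible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 80 negatives, 20 positives.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100),
                   "target": [0] * 80 + [1] * 20})

X = df.drop(columns="target")
y = df["target"]
# stratify=y preserves the 80/20 class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

print(y_train.mean(), y_test.mean())
```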

Normalization of the dataset
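A sketch of standardization, assuming `StandardScaler` (the notebook may use a different scaler). The key point is that the scaling statistics come from the training split only and are then reused for the test split.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical train/test feature matrices.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50, scale=10, size=(80, 3))
X_test = rng.normal(loc=50, scale=10, size=(20, 3))

scaler = StandardScaler()
# Mean and standard deviation are estimated on the training split only ...
X_train_scaled = scaler.fit_transform(X_train)
# ... and reused on the test split, so no test information leaks into training.
X_test_scaled = scaler.transform(X_test)
```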

Task 4

Additional imports

Find the best hyperparameters
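A hyperparameter search for an SVM is commonly done with `GridSearchCV`. The sketch uses synthetic data from `make_classification` and an illustrative grid scored by ROC AUC; the notebook's actual grid and scoring metric are not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for the normalized training data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Illustrative grid: gamma only matters for the RBF kernel.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]},
]
search = GridSearchCV(
    SVC(),
    param_grid,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```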

Result of the grid-search cross-validator: best hyperparameters

Receiver operating characteristic curve
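`roc_curve` only needs a continuous score for each sample, so an `SVC`'s `decision_function` works without enabling `probability=True`. The data and model below are synthetic placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic placeholder data and model.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
# Signed distance to the separating surface serves as the ranking score.
scores = clf.decision_function(X_test)
fpr, tpr, thresholds = roc_curve(y_test, scores)
```

Plotting `tpr` against `fpr` (for example with `matplotlib.pyplot.plot`) draws the curve.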

Confusion matrix of the performed classification
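For a binary problem, scikit-learn's `confusion_matrix` puts true classes in rows and predicted classes in columns. The labels below are hypothetical and show how to unpack the four cells.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)
# Layout for labels (0, 1):
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print(cm)
```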

AUC computation
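AUC can be computed either directly from labels and scores or as the area under an already computed ROC curve. The toy scores are hypothetical.

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical labels and classifier scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# Directly from labels and scores ...
auc_direct = roc_auc_score(y_true, y_score)
# ... or as the trapezoidal area under the ROC curve.
fpr, tpr, _ = roc_curve(y_true, y_score)
auc_from_curve = auc(fpr, tpr)
print(auc_direct, auc_from_curve)  # both 0.75
```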

Conclusions and answer to the question: which values should the computed confusion matrix aim for?

Task 5

PCA transformation (the retained components are expected to explain at least 90% of the variance)
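In scikit-learn, passing a float between 0 and 1 as `n_components` makes `PCA` keep the smallest number of components whose cumulative explained variance ratio reaches that fraction. The data is synthetic and standardized first, since PCA is sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the normalized feature matrix.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Keep just enough components to explain at least 90% of the variance.
pca = PCA(n_components=0.90, svd_solver="full")
X_pca = pca.fit_transform(X_scaled)
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```

In the actual workflow the PCA would be fitted on the training split and only applied (`transform`) to the test split, in the same way as the scaler.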

Fit the grid search on the new, transformed data to find the best hyperparameters
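One alternative to transforming the data once and then searching is to put the scaler, PCA, and SVM into a single `Pipeline`. `GridSearchCV` then refits the scaling and PCA inside every fold. This is a swapped-in technique shown for comparison, not necessarily what the notebook does; the data and grid are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

# Scaling and PCA are refitted on each fold's training part only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.90, svd_solver="full")),
    ("svc", SVC()),
])
# Step name + "__" + parameter addresses a parameter of a pipeline step.
param_grid = {"svc__kernel": ["linear", "rbf"], "svc__C": [0.1, 1, 10]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="roc_auc").fit(X, y)
print(search.best_params_)
```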

Result of the grid-search cross-validator: best hyperparameters

Receiver operating characteristic curve

Confusion matrix of the performed classification

AUC computation

Conclusions and observations